Broom: Sweeping Out Garbage Collection from Big Data Systems
نویسندگان
چکیده
Many popular systems for processing “big data” are implemented in high-level programming languages with automatic memory management via garbage collection (GC). However, high object churn and large heap sizes put severe strain on the garbage collector. As a result, applications underperform significantly: GC increases the runtime of typical data processing tasks by up to 40%. We propose to use region-based memory management instead of GC in distributed data processing systems. In these systems, many objects have clearly defined lifetimes. Hence, it is natural to allocate these objects in fate-sharing regions, obviating the need to scan a large heap. Regions can be memory-safe and could be inferred automatically. Our initial results show that regionbased memory management reduces emulated Naiad vertex runtime by 34% for typical data analytics jobs.
منابع مشابه
Reducing Sweep Time for a Nearly Empty Heap
Mark and sweep garbage collectors are known for using time proportional to the heap size when sweeping memory, since all objects in the heap, regardless of whether they are live or not, must be visited in order to reclaim the memory occupied by dead objects. This paper introduces a sweeping method which traverses only the live objects, so that sweeping can be done in time dependent only on the ...
متن کاملThe Active Memory Module: Bounded Garbage Collection Latency for Real-time Embedded Systems
In this paper, the performance analysis of the proposed Active Memory Module (AMM) for embedded systems is presented. Unlike the software counterparts, the AMM can perform a memory allocation in a predictable and bounded fashion (5 cycles). Moreover, it can also yield a bounded sweeping time regardless of the number of live objects or heap size. By utilizing the proposed system, the overall spe...
متن کاملYak: A High-Performance Big-Data-Friendly Garbage Collector
Most “Big Data” systems are written in managed languages, such as Java, C#, or Scala. These systems suffer from severe memory problems due to the massive volume of objects created to process input data. Allocating and deallocating a sea of data objects puts a severe strain on existing garbage collectors (GC), leading to high memory management overheads and reduced performance. This paper descri...
متن کاملActive Memory Processor: A Hardware Garbage Collector for Real-Time Java Embedded Devices
Java possesses many advantages for embedded system development, including fast product deployment, portability, security, and a small memory footprint. As Java makes inroads into the market for embedded systems, much effort is being invested in designing real-time garbage collectors. The proposed garbage-collected memory module, a bitmap-based processor with standard DRAM cells is introduced to...
متن کاملHow Data Volume Affects Spark Based Data Analytics on a Scale-up Server
Sheer increase in volume of data over the last decade has triggered research in cluster computing frameworks that enable web enterprises to extract big insights from big data. While Apache Spark is gaining popularity for exhibiting superior scale-out performance on the commodity machines, the impact of data volume on the performance of Spark based data analytics in scale-up configuration is not...
متن کامل